Parallel Restructuring and Evaluation of Expressions¹

D. E. Muller and F. P. Preparata
University of Illinois at Urbana-Champaign

Abstract

In this paper we describe a boolean network of size $O(N^2 \log N)$ which accepts a fully parenthesized $N$-variable expression over a given semiring and produces its value in $O(\log N)$ time. The network consists of two components: a preprocessor and a universal evaluator. The preprocessor computes the destinations of the expression terms and routes them to the correct input terminals of the universal evaluator.
1. Introduction

The evaluation of tree-structured expressions is a fundamental computation encountered in several problems. The feasibility of parallel computing has attracted considerable research interest to the restructuring of expressions - typically arithmetic expressions - to speed up their evaluation. While restructuring for parallel evaluation has been the main objective for some time [BB68, M71, MP71, B73, BKM73, B74, KM75, MP76, PM76], more recently attention has focused on the parallelization of the actual evaluation starting from the original expression itself. Several algorithms have been recently proposed for implementation on the P-RAM model [BV85, MR85, GR86, CV87, GV88, KD88]; the most efficient of these algorithms achieve time $O(\log N)$ for an $N$-variable expression, and are either optimal, $O(N / \log N)$, or near-optimal, $O(N)$, in the number of processors used.

1  This  work  was  supported  in  part  by  NSF  Grant  CCR-87-03807  and  by  the  Joint 
Services  Electronics  Program  under  Contract  N00014-84-C-0149. 
The evaluation of an expression in parallel could be carried out as the combined execution of the restructuring and evaluation tasks. Indeed, in this paper we propose a method consisting of two cascaded phases, i.e. the restructuring of the expression followed by its evaluation. The adopted framework is the boolean network model. Specifically, our network consists of two components: a universal evaluator, i.e. a network designed to carry out the evaluation of any expression with at most $N$ variables, and a preprocessor, designed to compute the assignments to the terminals of the universal evaluator of the variables and connectives of the given expression. Our results, which combine Theorems 1, 4, 5, and 6 in this paper, are summarized by the following theorem:

Theorem A: An $N$-variable expression $E$ over a semiring can be restructured and evaluated in time $O(\log N)$ by an $O(N^2 \log N)$-size boolean network.

Although the term “semiring” implies that the two operations “+” and “·” may have no inverses, the present scheme permits the inclusion of their inverses “−” and “÷” in the calculation if they do exist.

A result with analogous time performance, but based on an entirely different approach, has been developed by Buss, Cook, Gupta, and Ramachandran [private communication].

2.  Expressions 

Let $E$ be an expression over a semiring, where all variables are assumed to be distinct. Such an expression will be called a primitive expression. Thus, we require only that the algebraic structure to which the variables belong has two associative operations, conventionally “+” and “·”, such that “·” distributes over “+”, and that there is the additive identity 0. Obviously, this class includes rings, fields, distributive lattices, and boolean algebras. For concreteness of presentation, we assume $E$ to be an expression over the field of rationals (i.e. the operators are “+” and “·” and their inverses), but specializations to other cases are straightforward (nowhere will commutativity be invoked).

Expression $E$ is thought of as defining a computation tree $T(E)$, and is given as a fully parenthesized string, where (i) a variable $a$ is an atomic expression $(a)$, and (ii) given two expressions $E_1$ and $E_2$, the string $(E_1 \gamma E_2)$ is an expression with $\gamma \in \{+, \cdot, -, \div\}$.

Example: $((((((a_1)\gamma_1(a_2))\gamma_2(a_3))\gamma_3((a_4)\gamma_4(a_5)))\gamma_5(a_6))\gamma_6(a_7))$ is a fully parenthesized expression, with variables $\{a_1, a_2, a_3, a_4, a_5, a_6, a_7\}$ and operators $\{\gamma_1, \gamma_2, \gamma_3, \gamma_4, \gamma_5, \gamma_6\}$. It is a trivial exercise to show that an expression with $N$ variables (and $N - 1$ operators) has $4N - 2$ parentheses, i.e. a total of $6N - 3$ symbols.

A term of an expression is either a variable or an operator. Note that a variable occurs between two facing parentheses “( )” and an operator between two opposing parentheses “) (”. The label $\lambda(a)$ of a term $a$ is its level in $T(E)$, i.e. the number of edges in the path between the root and the node of the term itself (thus, the root has label 0). It is easily seen that the label of a term is given by: (number of left parentheses to its left) - (number of right parentheses to its left) - 1. Thus, if we associate the integers $+1$ and $-1$ with each left and right parenthesis of $E$, respectively, then the labels of the terms are obtained by subtracting 1 from the prefix sums over the subsequence of parentheses of $E$. It is well-known (see, e.g. [LF80]) that such prefix sums can be computed for an $N$-variable expression by an $O(N)$-node tree network in time $O(\log N)$.
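The level rule can be illustrated with a minimal Python sketch. The function name `term_levels` and the single-character encoding of variables and operators are assumptions made for this illustration only; the sequential scan merely stands in for the $O(\log N)$-time prefix-sum tree network.

```python
from itertools import accumulate


def term_levels(expr):
    """Return (term, level) pairs for a fully parenthesized expression.

    Assumes every variable and operator is a single character,
    e.g. "(((a)+(b))*(c))".
    """
    # +1 for "(", -1 for ")", 0 for a variable or operator.
    deltas = [1 if ch == "(" else -1 if ch == ")" else 0 for ch in expr]
    prefix = list(accumulate(deltas))
    # A term contributes 0, so the inclusive prefix sum at its position is
    # (#left parentheses to its left) - (#right parentheses to its left).
    return [(ch, prefix[i] - 1) for i, ch in enumerate(expr) if ch not in "()"]


example = "(((a)+(b))*(c))"
levels = term_levels(example)
```

For `example`, the root operator `*` receives level 0, the operator `+` and the variable `c` receive level 1, and `a` and `b` receive level 2. The string has $N = 3$ variables and $6N - 3 = 15$ symbols.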

3.  The  Universal  Evaluator. 

The universal evaluator network is based on a restructuring scheme due to Brent [B73], which we now review.

The variables of an expression are assumed to be of two kinds: atomic variables $a_1, \ldots, a_r$ and free variables $x_1, \ldots, x_s$. An expression is referred to as an A- or E-expression depending upon whether or not it contains free variables, respectively. Normally we will use the letter “E” for E-expressions and the letter “A” for A-expressions. The weight “| |” of an expression is the number of atomic variables it contains. The symbols $\theta$ and $\varphi$ will be used to represent the operator that occurs at level 0 in the tree corresponding to an E- and an A-expression, respectively.

Given an E-expression $E$ with $N$ variables, $2^{J-1} \le N \le 2^J - 1$, the breakpoint of $E$ is a unique node $v$ of $T(E)$, such that the expression associated with the subtree rooted at $v$ is $(E' \theta E'')$, with $\max(|E'|, |E''|) \le 2^{J-1} - 1$, and $|E'| + |E''| \ge 2^{J-1}$. If we excise $T(E' \theta E'')$ from $T(E)$ and replace it with a free variable $x$ (see Figure 1), we obtain the tree of an A-expression $A \circ x$ with $|A| = |E| - (|E'| + |E''|) \le N - 2^{J-1} \le 2^{J-1} - 1$. We call $A \circ (E' \theta E'')$ the canonical decomposition of $E$.

Analogously, given an A-expression $A \circ x$ with $N$ atomic variables, $2^{J-1} \le N \le 2^J - 1$, the breakpoint of $A \circ x$ (a $\varphi$-breakpoint) is a unique node $v$ of $T(A \circ x)$ such that the expression associated with the tree rooted at $v$ is either $((A' \circ x) \varphi E)$ or $(E \varphi (A' \circ x))$, with $|A'| \le 2^{J-1} - 1$, and $|A'| + |E| \ge 2^{J-1}$. Again, if we excise from $T(A \circ x)$ the subtree rooted at $v$ and replace it with a free variable $y$ (see Figure 1), we obtain an A-expression $(A'' \circ y)$, with $|A''| = |A| - (|A'| + |E|) \le N - 2^{J-1} \le 2^{J-1} - 1$ (see Figure 1b). We call $A'' \circ ((A' \circ x) \varphi E)$, or $A'' \circ (E \varphi (A' \circ x))$ as appropriate, the canonical decomposition of $A \circ x$.

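Brent’s scheme is based on the following standard forms for E- and A-expressions:

$$E = \frac{E_1}{E_2}, \qquad A \circ x = \frac{A_{11} x + A_{12}}{A_{21} x + A_{22}},$$

where $E_1$, $E_2$, $A_{11}$, $A_{12}$, $A_{21}$, $A_{22}$ are division-free expressions. An E-expression $E$ is given by the pair $(E_1, E_2)$ and an A-expression $A \circ x$ by the quadruple $(A_{11}, A_{12}, A_{21}, A_{22})$. (Notice that for division-free arithmetic expressions, $E_2 = A_{22} = 1$ and $A_{21} = 0$.) The canonical decomposition $E = A \circ (E' \theta E'')$ yields:

$$E_1 = \text{numerator of } \left( A_{11} \left( \frac{E'_1}{E'_2} \,\theta\, \frac{E''_1}{E''_2} \right) + A_{12} \right) \tag{1}$$

$$E_2 = \text{numerator of } \left( A_{21} \left( \frac{E'_1}{E'_2} \,\theta\, \frac{E''_1}{E''_2} \right) + A_{22} \right)$$

with $|E| \le 2^J - 1$ and $|E'|, |E''|, |A| \le 2^{J-1} - 1$.

Similarly, the canonical decomposition $A \circ x = A'' \circ ((A' \circ x) \varphi E)$ yields:

$$A_{11} x + A_{12} = \text{numerator of } \left( A''_{11} \left( \frac{A'_{11} x + A'_{12}}{A'_{21} x + A'_{22}} \,\varphi\, \frac{E_1}{E_2} \right) + A''_{12} \right) \tag{2}$$

$$A_{21} x + A_{22} = \text{numerator of } \left( A''_{21} \left( \frac{A'_{11} x + A'_{12}}{A'_{21} x + A'_{22}} \,\varphi\, \frac{E_1}{E_2} \right) + A''_{22} \right)$$

with $|A| \le 2^J - 1$ and $|A'|, |A''| \le 2^{J-1} - 1$, $|E| \le 2^J - 1$. Notice that Relations (1) and (2) involve no division; this and relation $E = E_1 / E_2$ indicate that the evaluation of $E$ involves a single division at the end.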
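Relations (1) and (2) can be made concrete with a small Python sketch over the rationals. The function names `combine`, `theta_combiner`, and `phi_combiner`, the tuple encodings, and the restriction to the case where $A' \circ x$ is the left operand of $\varphi$ are assumptions of this illustration, not the paper's circuit; integer arithmetic is used, so commutativity is implicitly assumed here.

```python
from fractions import Fraction


def combine(op, p, q):
    """Apply op to fractions p = (p1, p2) and q = (q1, q2) without dividing."""
    (p1, p2), (q1, q2) = p, q
    if op == "+":
        return (p1 * q2 + q1 * p2, p2 * q2)
    if op == "-":
        return (p1 * q2 - q1 * p2, p2 * q2)
    if op == "*":
        return (p1 * q1, p2 * q2)
    if op == "/":
        return (p1 * q2, p2 * q1)
    raise ValueError(f"unknown operator {op!r}")


def theta_combiner(A, theta, E_left, E_right):
    """Relation (1): the pair (E1, E2) for E = A o (E' theta E'')."""
    a11, a12, a21, a22 = A
    num, den = combine(theta, E_left, E_right)  # x = num / den
    return (a11 * num + a12 * den, a21 * num + a22 * den)


def phi_combiner(A_outer, phi, A_inner, E):
    """Relation (2) for A o x = A'' o ((A' o x) phi E); returns (A11, A12, A21, A22)."""
    a, b, c, d = A_inner  # A' o x = (a x + b) / (c x + d)
    e1, e2 = E            # E = e1 / e2
    # num and den hold (coefficient of x, constant term) of the inner result.
    if phi == "+":
        num, den = (a * e2 + c * e1, b * e2 + d * e1), (c * e2, d * e2)
    elif phi == "-":
        num, den = (a * e2 - c * e1, b * e2 - d * e1), (c * e2, d * e2)
    elif phi == "*":
        num, den = (a * e1, b * e1), (c * e2, d * e2)
    elif phi == "/":
        num, den = (a * e2, b * e2), (c * e1, d * e1)
    else:
        raise ValueError(f"unknown operator {phi!r}")
    p11, p12, p21, p22 = A_outer
    return (p11 * num[0] + p12 * den[0], p11 * num[1] + p12 * den[1],
            p21 * num[0] + p22 * den[0], p21 * num[1] + p22 * den[1])


# E = 2 * (1/2 + 1/3) + 3, with A o x = 2x + 3 encoded as (2, 3, 0, 1).
E1, E2 = theta_combiner((2, 3, 0, 1), "+", (1, 2), (1, 3))
value = Fraction(E1, E2)  # the single division at the end
```

Only the last line divides; both combiners use additions and multiplications alone, which mirrors the remark that the evaluation involves a single division at the end.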

These conditions on the weights of the terms of the decomposition readily establish that the depths of the restructured trees for both $E$ and $A$ are $O(J)$. Moreover, the above decompositions yield the structures of universal evaluators. We shall use script letters “$\mathcal{E}$” and “$\mathcal{A}$” to denote universal evaluators for E- and A-expressions, respectively. (Note the distinction between an expression - letters “E” and “A” - and the corresponding universal evaluator networks - letters “$\mathcal{E}$” and “$\mathcal{A}$”.) It must be underscored that a universal evaluator network is itself described by an expression; however, the universal evaluator suited for all expressions of weight not exceeding a fixed bound $B$ has substantially larger weight than $B$, as we shall see shortly.

Figure 2: Recursive structures of universal evaluators of E- and A-expressions.

Specifically, the symbol $\mathcal{E}^{(J)}$ denotes the universal evaluator suited for an expression $E$ such that $2^{J-1} \le |E| \le 2^J - 1$. Analogously, we define $\mathcal{A}^{(J)}$. The structures of $\mathcal{E}^{(J)}$ and $\mathcal{A}^{(J)}$ are shown in Figure 2, where $\theta$- and $\varphi$-combiners are respectively fixed-size modules implementing computations (1) and (2). Denoting by $e_j$ and $a_j$ the numbers of input terminals of networks $\mathcal{E}^{(j)}$ and $\mathcal{A}^{(j)}$, respectively, we readily have the following recurrence relations:


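$$\begin{cases} e_j = 2 e_{j-1} + a_{j-1} + 1 \\ a_j = 2 a_{j-1} + e_j + 1 \end{cases}$$

which, combined with the initial values $e_1 = 1$ and $a_1 = 2$, yield

$$e_j = \frac{4^j - 1}{3}, \qquad a_j = 2 \cdot \frac{4^j - 1}{3}, \qquad \text{for } j = 1, 2, \ldots \tag{3}$$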
Figure 3: Biasing of $\mathcal{A}^{(1)}$ to evaluate $(x \varphi a)$.

Letting $J = \lceil \log_2 N \rceil$, $e_J = (4^{\lceil \log_2 N \rceil} - 1)/3 = \Theta(N^2)$ is the number of input terminals of the universal evaluator for $N$-variable expressions, with $2^{J-1} \le N \le 2^J - 1$.
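As a quick numerical check, the terminal counts can be iterated directly. This sketch assumes the recurrences $e_j = 2e_{j-1} + a_{j-1} + 1$ and $a_j = 2a_{j-1} + e_j + 1$ (read off the structure of Figure 2) and the convention $e_0 = a_0 = 0$, which reproduces the stated initial values; the function name `terminal_counts` is hypothetical.

```python
def terminal_counts(J):
    """Iterate e_j = 2 e_{j-1} + a_{j-1} + 1 and a_j = 2 a_{j-1} + e_j + 1."""
    e, a = [0], [0]  # assumed e_0 = a_0 = 0, which gives e_1 = 1 and a_1 = 2
    for j in range(1, J + 1):
        e.append(2 * e[j - 1] + a[j - 1] + 1)
        a.append(2 * a[j - 1] + e[j] + 1)
    return e, a


e, a = terminal_counts(8)
# Compare against the closed forms (4^j - 1)/3 and 2(4^j - 1)/3.
closed_form_ok = all(
    e[j] == (4**j - 1) // 3 and a[j] == 2 * (4**j - 1) // 3
    for j in range(1, 9)
)
```

Since $e_J$ grows as $4^J$ and $J$ is about $\log_2 N$, the count is quadratic in $N$, in agreement with the $\Theta(N^2)$ figure.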

It is a simple exercise to verify that $\mathcal{E}^{(1)}$ for an input variable $a$ is a trivial circuit consisting of a straight wire carrying the variable and of a second output with the constant bias 1. Network $\mathcal{A}^{(1)}$ for an A-expression $(x \varphi a)$, consisting of a single atomic variable $a$ and an operator $\varphi$, is realized by the biasing of the standard $\varphi$-combiner shown in Figure 3. It is immediate to verify that the default biasings of the terminals of the universal evaluator are value “0” for a variable terminal and operator “+” for an operator terminal.

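We now consider the latency (i.e., the computation time) of the universal evaluator. Let $\delta(\mathcal{E}^{(j)})$ and $\delta(\mathcal{A}^{(j)})$ respectively denote the latencies of $\mathcal{E}^{(j)}$ and of $\mathcal{A}^{(j)}$, and let $\tau_\varphi$ and $\tau_\theta$ respectively denote the delays of the $\varphi$- and $\theta$-combiners. From the schemes of Figure 2 we deduce:

$$\begin{cases} \delta(\mathcal{E}^{(j)}) = \max(\delta(\mathcal{E}^{(j-1)}), \delta(\mathcal{A}^{(j-1)})) + \tau_\theta \\ \delta(\mathcal{A}^{(j)}) = \max(\delta(\mathcal{A}^{(j-1)}), \delta(\mathcal{E}^{(j)})) + \tau_\varphi. \end{cases}$$

We readily have $\delta(\mathcal{A}^{(j)}) = \max(\delta(\mathcal{A}^{(j-1)}), \delta(\mathcal{E}^{(j-1)}) + \tau_\theta, \delta(\mathcal{A}^{(j-1)}) + \tau_\theta) + \tau_\varphi = \max(\delta(\mathcal{E}^{(j-1)}), \delta(\mathcal{A}^{(j-1)})) + \tau_\theta + \tau_\varphi$, which yields $\delta(\mathcal{A}^{(j)}) = \delta(\mathcal{E}^{(j)}) + \tau_\varphi$ and $\delta(\mathcal{E}^{(j)}) = \delta(\mathcal{A}^{(j-1)}) + \tau_\theta = \delta(\mathcal{E}^{(j-1)}) + \tau_\varphi + \tau_\theta$. From the condition $\delta(\mathcal{E}^{(1)}) = 0$ we have

$$\delta(\mathcal{E}^{(J)}) = (J - 1)(\tau_\varphi + \tau_\theta) = (\lceil \log N \rceil - 1)(\tau_\varphi + \tau_\theta).$$

(Here and in what follows, all logarithms are assumed to be base 2.) We also note that delays can be trivially inserted so that $\mathcal{E}^{(J)}$ can be used in pipeline fashion.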
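The latency recurrences can be checked numerically with a short sketch. It assumes the pair of relations $\delta(\mathcal{E}^{(j)}) = \max(\delta(\mathcal{E}^{(j-1)}), \delta(\mathcal{A}^{(j-1)})) + \tau_\theta$ and $\delta(\mathcal{A}^{(j)}) = \max(\delta(\mathcal{A}^{(j-1)}), \delta(\mathcal{E}^{(j)})) + \tau_\varphi$, together with the convention $\delta(\mathcal{A}^{(0)}) = 0$, which is an assumption of this illustration; the function name `latencies` and the delay values are hypothetical.

```python
def latencies(J, tau_theta, tau_phi):
    """Iterate the latency recurrences for the E- and A-evaluators up to level J."""
    d_E = {1: 0}   # delta(E^(1)) = 0: a straight wire
    d_A = {0: 0}   # assumed convention for the empty A-evaluator
    d_A[1] = max(d_A[0], d_E[1]) + tau_phi
    for j in range(2, J + 1):
        d_E[j] = max(d_E[j - 1], d_A[j - 1]) + tau_theta
        d_A[j] = max(d_A[j - 1], d_E[j]) + tau_phi
    return d_E, d_A


tau_theta, tau_phi = 3, 5  # arbitrary illustrative combiner delays
d_E, d_A = latencies(10, tau_theta, tau_phi)
```

With these values the latency of the E-evaluator grows by $\tau_\varphi + \tau_\theta = 8$ per level, so `d_E[J]` equals $(J - 1) \cdot 8$.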

We  summarize  the  preceding  discussion  with  the  following  theorem: 

Theorem 1. The universal evaluator for an $N$-variable expression has depth $O(\log N)$ and size $O(N^2)$.

4.  Parallel  restructuring 

In this section we discuss the overall scheme whereby the terms of a given expression $E$ (variables and operators) are applied to the terminals of the universal evaluator $\mathcal{E}^{(\lceil \log |E| \rceil)}$. A flow-diagram of the process outlined in the introduction is given in Figure 4. The expression $E$ is applied at first to the preprocessor, which computes for each term $a$ of $E$ an integer $\rho(a)$, called the assignment of $a$. Subsequently, for each term $a$, the pair $(\rho(a), a)$ is formed. The router directs term $a$ to the terminal of index $\rho(a)$ of the universal evaluator. For brevity, the same denotation will be used for a term of $E$ and for the corresponding node of $T(E)$.

Definition. Let $w$ be a node of $T(E)$, and $E_w$ the expression represented by the subtree of $T(E)$ rooted at $w$. In $T(E)$ a left (resp. right) ancestor of a node $v$ is any node such that $v$ belongs to its right (resp. left) subtree. For $v$ in $T(E_w)$, we define $\rho(v|w)$ - the assignment of $v$ relative to $w$ - as the index of the terminal of $\mathcal{E}^{(\lceil \log |E_w| \rceil)}$ to which term $v$ should be applied. We also define $\rho(v|\mathrm{root}(T(E))) = \rho(v)$, the (absolute) assignment of $v$.

Figure 4: Flow diagram of the overall restructuring/evaluation process.

We now analyze the calculation of $\rho(\,)$. The structure of the universal evaluator for an expression with at most $(2^J - 1)$ variables exhibits the following properties:²

(i) In the canonical decomposition $A \circ (E' \theta E'')$, the component expressions appear in the left-to-right order $E'$, $E''$, $A \circ x$. Moreover, we assume $|E'| \ge |E''|$.

(ii) In the canonical decomposition $A'' \circ (E \varphi (A' \circ x))$ or $A'' \circ ((A' \circ x) \varphi E)$, the component expressions appear in the left-to-right order $A' \circ x$, $E$, $A'' \circ y$.

Lemma 1. If the orders of the positions of the terms in $E$ and of their assignments to the terminals of $\mathcal{E}^{(J)}$ are to be consistent, then the original computation tree $T(E)$ must be rearranged to satisfy the following conditions:

1. Each free variable is the left child of its parent.

2. At each $\varphi$-breakpoint the A-expression is associated with the left subtree.

3. At each $\theta$-breakpoint the heavier subexpression $E'$ is associated with the left subtree.

2 If the operations are not commutative, then we introduce new denotations for the original operations with the operands in reverse order.

Proof: We say that a free variable is terminal if it was generated by the removal of an expression of the form $(E' \theta E'')$. Proof of (1): If the free variable $x$ is terminal, then by Property (i) the expression $(E' \theta E'')$ is to the left of $A \circ x$, which implies that each term of $A$ is to the right of $(E' \theta E'')$, i.e., $x$ is a left child. If the free variable $y$ is not terminal, then by Property (ii), the expression $E \varphi (A' \circ x)$ - or $(A' \circ x) \varphi E$ - appears to the left of $A'' \circ y$, which again implies that $y$ is a left child. Proof of (2): An immediate consequence of Property (ii), by which $(A' \circ x)$ appears to the left of $E$. Proof of (3): Trivial, by Property (i). □

We claim that in the tree $T'(E)$, obtained by rearranging the original $T(E)$ in order to comply with Conditions (1), (2), and (3) above, the left subtree of each internal node is at least as heavy as the right subtree. This is trivially so for a $\theta$-breakpoint. For a $\varphi$-breakpoint, the left subtree is, by (2), an A-expression of the form $A' \circ x$. We distinguish two cases: (a) $|A'| = 0$, i.e., the A-expression reduces to a free variable $x$. In this case $x$ is terminal and originated by a decomposition of the form $A \circ (E' \theta E'')$ with $|E'| + |E''| > |A|$. Since the right subtree of the parent of $x$ in $T'(E)$ has weight at most $|A|$, the claim holds. (b) $|A'| > 0$. In this case we have the situation illustrated in Figure 5. If $x$ is terminal, then the expression of the tree rooted at node $\varphi$ is $(A' \circ E^*) \varphi E$, with $|E^*| > |A'| + |E|$ (by the selection of breakpoints) and thus $|A'| + |E^*| > |E|$. If $x$ is not terminal, then there is an A-expression $(A^* \circ y)$, with $y$ terminal, such that the expression of the tree rooted at node $\varphi$ is $(A' \circ (A^* \circ E^*)) \varphi E$, with $|E^*| > |A'| + |A^*| + |E|$, and thus $|A'| + |A^*| + |E^*| > |E|$. This establishes the claim.

Figure 5: Illustration of the chaining of free variables (panels: $x$ is terminal; $x$ is nonterminal, $y$ is terminal).

Referring to the rearranged tree $T'(E)$, for each node $v$ we define two companion nodes $r(v)$ and $l(v)$ as follows:

$r(v)$: the farthest right ancestor of $v$ such that each node in the path from $v$ to $r(v)$ is a right ancestor of $v$. (Note, $r(v)$ always exists and may coincide with $v$.)

$l(v)$: the closest left ancestor of $v$, i.e. $l(v)$ is the parent of $r(v)$. (Note, $l(v)$ may not exist, in which case we say $l(v) = \epsilon$, the empty node.)

We then have

$$\rho(v) = \rho(v|r(v)) + \rho(l(v)).$$

Indeed $v$ is on the leftmost path of the computation tree $T(E_{r(v)})$, and $\rho(v|r(v))$ is the assignment term $v$ would receive in $T(E_{r(v)})$; moreover, $v$ is in the right subtree of $l(v)$, so that the assignment of $v$ must be offset by $\rho(l(v))$. Iterating the above formula we obtain

$$\rho(v) = \rho(v|r(v)) + \sum_{w \in \Lambda(v)} \rho(w|r(w)) \tag{4}$$

where $\Lambda(v)$ is the set of left ancestors of $v$ in $T'(E)$.

Formula (4) reduces our problem to the calculation of $\rho(v|r(v))$ for an arbitrary $v$ in $E$. The value of $\rho(v|r(v))$ depends exclusively on the weights $L(v)$ and $R(v)$ of the left and right subtrees of $v$, respectively. Indeed, in the semiclosed interval $(\max(L(v), R(v)),\ L(v) + R(v)]$ there is a unique integer of the form $p 2^q$ with largest value of $q$. Let $1\, b_{s-2} \ldots b_{q+1}\, 1\, 0 \ldots 0$ be the binary spelling of $p 2^q$.

Suppose at first that $p = 1$ (i.e., $s - 1 = q$). Term $v$ is a $\theta$-breakpoint, and its terminal of the universal evaluator occurs immediately to the right of a subnetwork $\mathcal{E}^{(s-1)}$, i.e., $\rho(v|r(v)) = e_{s-1} + 1$.

When $p > 1$, term $v$ is a $\varphi$-breakpoint, and its terminal of the universal evaluator occurs after the following sequence of subnetworks (refer to Figure 2):

1. $\mathcal{E}^{(s-1)}$, a terminal for a $\theta$-operator, $\mathcal{E}^{(s-1)}$ (for a total of $2e_{s-1} + 1$ terminals).

2. For $j = s - 2$ down to $q + 1$: a terminal for a $\varphi$-operator, $\mathcal{A}^{(j)}$, $\mathcal{E}^{(j+1)}$ (for a total of $a_j + e_{j+1} + 1$ terminals) if and only if $b_j = 1$.

3. $\mathcal{A}^{(q)}$.

Therefore

$$\rho(v|r(v)) = (2e_{s-1} + 1) + \sum_{j=q+1}^{s-2} b_j (a_j + e_{j+1} + 1) + a_q + 1.$$
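Using equalities (3) we obtain

$$\begin{aligned}
\rho(v|r(v)) &= 2 \cdot \frac{4^{s-1} - 1}{3} + 1 + \sum_{j=q+1}^{s-2} b_j \cdot 2 \cdot 4^j + 2 \cdot \frac{4^q - 1}{3} + 1 \\
&= 2 \sum_{j=0}^{s-2} 4^j + \sum_{j=q+1}^{s-2} 2 b_j \cdot 4^j + 2 \sum_{j=0}^{q-1} 4^j + 2 \\
&= 4^{s-1} - \sum_{j=0}^{s-2} 4^j - 1 + \sum_{j=q+1}^{s-2} 2 b_j \cdot 4^j + 2 \sum_{j=0}^{q-1} 4^j + 2 \\
&= 4^{s-1} + \sum_{j=q+1}^{s-2} (2 b_j - 1) 4^j - 4^q + \sum_{j=0}^{q-1} 4^j + 1 \\
&= \sum_{j=0}^{s-1} (2 c_j - 1) 4^j + 1
\end{aligned} \tag{5}$$

where $c_{s-1} c_{s-2} \ldots c_0$ is the binary spelling of $p 2^q - 1$.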
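Formula (5) can be cross-checked against the terminal count obtained from the sequence of subnetworks. The following Python sketch is an illustration under stated assumptions: it treats only operator nodes ($L, R \ge 1$), takes the sum in (5) over the bits of the binary spelling of $p2^q - 1$ without leading zeros (which also covers the case $p = 1$), and uses the closed forms $e_j = (4^j - 1)/3$, $a_j = 2(4^j - 1)/3$. All function names are hypothetical.

```python
def terminals(j):
    """Closed forms: e_j = (4^j - 1)/3 and a_j = 2(4^j - 1)/3."""
    e = (4**j - 1) // 3
    return e, 2 * e


def breakpoint_integer(L, R):
    """Return (p * 2^q, q): the integer in (max(L, R), L + R] with largest q."""
    lo, hi = max(L, R), L + R
    q = (lo ^ hi).bit_length() - 1  # highest bit where lo and hi differ
    return (hi >> q) << q, q


def relative_assignment(L, R):
    """Formula (5): sum of (2 c_j - 1) 4^j over the bits of p 2^q - 1, plus 1."""
    m, _ = breakpoint_integer(L, R)
    c = m - 1
    total = 1
    for j in range(c.bit_length()):
        bit = (c >> j) & 1
        total += (2 * bit - 1) * 4**j
    return total


def relative_assignment_direct(L, R):
    """Count terminals along the sequence of subnetworks, for cross-checking."""
    m, q = breakpoint_integer(L, R)
    s = m.bit_length()
    if s - 1 == q:  # p = 1: terminal right after one E^(s-1)
        return terminals(s - 1)[0] + 1
    total = 2 * terminals(s - 1)[0] + 1
    for j in range(q + 1, s - 1):
        if (m >> j) & 1:  # b_j = 1
            total += terminals(j)[1] + terminals(j + 1)[0] + 1
    return total + terminals(q)[1] + 1
```

For instance, $L = 5$, $R = 2$ gives the interval $(5, 7]$, whose integer with the most trailing zeros is $6 = 3 \cdot 2^1$; then $p2^q - 1 = 5 = 101_2$ and formula (5) gives $16 - 4 + 1 + 1 = 14$.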

From a computational viewpoint we note: If $m_{r-1} \ldots m_0$ and $s_{r-1} \ldots s_0$ are the binary spellings of $\max(L(v), R(v))$ and of $L(v) + R(v)$, respectively, then $(s - 1)$ is the largest value of $j$ for which $m_j = s_j = 1$, and $q$ is the largest value of $j$ for which $m_j = 0$ and $s_j = 1$. Thus in time $O(\log N)$ we can obtain $c_{s-1} c_{s-2} \ldots c_q$. To obtain $\rho(v|r(v))$ from this number we perform the following transformation

$$a^{(j)}_{2j+2}\, a^{(j)}_{2j+1}\, a^{(j)}_{2j} = \begin{cases} 1\,1\,1 & \text{if } c_j = 0 \\ 0\,0\,1 & \text{if } c_j = 1 \end{cases}$$

and add modulo-2 bit-by-bit the equally-indexed bits of the corresponding $s$ binary numbers.
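The bitwise transformation can be compared with the arithmetic sum of formula (5) in a short sketch. It reads the transformation as placing the pattern 111 (if $c_j = 0$) or 001 (if $c_j = 1$) at bit positions $2j+2, 2j+1, 2j$, and it assumes that the final $+1$ of formula (5) is added after the modulo-2 sum; both readings, and the function names, are assumptions of this illustration.

```python
def assignment_by_sum(c):
    """Formula (5): sum of (2 c_j - 1) 4^j over the bits of c, plus 1."""
    return sum(
        (2 * ((c >> j) & 1) - 1) * 4**j for j in range(c.bit_length())
    ) + 1


def assignment_by_xor(c):
    """Same value via 3-bit patterns combined by bitwise addition modulo 2."""
    word = 0
    for j in range(c.bit_length()):
        pattern = 0b001 if (c >> j) & 1 else 0b111
        word ^= pattern << (2 * j)  # pattern occupies bits 2j+2, 2j+1, 2j
    return word + 1


agree = all(assignment_by_sum(c) == assignment_by_xor(c) for c in range(1, 1 << 12))
```

The agreement relies on the leading bit of $c$ being 1, which always holds when $c$ is written without leading zeros. No carries are needed, which is what makes the transformation attractive for a shallow circuit.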

The calculation of the subtree weights $L(v)$ and $R(v)$ is the topic of Section 5. In Section 6 we shall address the question of the distribution of the offsets $\rho(w|r(w))$ for $w \in \Lambda(v)$. Finally, Section 7 will discuss the routing of $v$ to the terminal of $\mathcal{E}^{(J)}$ indexed $\rho(v)$.

5. Calculation of Subtree Weights

The expression $E$ consists of a sequence of parentheses, variables, and operators. We shall write $\bar{E}$ to denote the subsequence of $E$ formed by erasing the parentheses, leaving just the operators and variables. We note that the members of $\bar{E}$ are in one-to-one correspondence with the nodes of the tree $T(E)$, and we have seen at the end of Section 2 that a label $\lambda(v)$, representing its level in $T(E)$, may be found for each such node $v$.

Let $v$ be a variable in the expression $E$, so that $v$ corresponds to a leaf of $T(E)$. There is a unique path from the root of $T(E)$ to $v$. We write this path as $v_0, v_1, \ldots, v_p$ where $v_0$ is the root of $T(E)$ and $v_p = v$. For each node $v_i$ in the path $\lambda(v_i) = i$, and if $i$ is less than $p$, then $v_i$ is an operator.

We now wish to investigate the properties of the subsequence of $\bar{E}$ corresponding to the path $v_0, v_1, \ldots, v_p$ of $T(E)$. We begin with the following:

Lemma 2: In the sequence $\bar{E}$, for any $i < p$ there is no element between $v_i$ and $v_p$ whose level is less than $i$. Furthermore, if $x$ is any operator or variable of some level $i < p$ such that every element between $x$ and $v_p$ is of level greater than $i$, then $x$ must be $v_i$.

Proof: The variable $v_p$ is either in the right or the left subtree of $T(E_{v_i})$ (cf. Definition 1). Let us assume first that it is in the left subtree so that $v_p$ is to the left of $v_i$ in $\bar{E}$. Then there is a subsequence of $E$ of the form $(E' v_i E'')$ such that $v_p$ occurs in $E'$. All the operators and variables in $E'$ have level greater than $i$ so we may conclude that all elements of $\bar{E}$ between $v_p$ and $v_i$ have level greater than $i$. Furthermore, if we assume that $x$ is some variable or operator to the left of $v_p$ in $\bar{E}$ which has level no greater than $i$, and such that every element of $\bar{E}$ between $x$ and $v_p$ is of level greater than $i$, then $x$ must be the first operator immediately to the left of $(E' v_i E'')$. Since it must have level less than $i$ it cannot be $v_i$, and the conclusion follows. Second, if we assume that $v_p$ is in the right subtree a precisely analogous argument gives the same result. □

Now, starting with $v_p$ we define the first forward subsequence $FS(v_p)$ of $\bar{E}$, starting with $v_p$ with monotone decreasing levels, to consist of elements $a_1, a_2, \ldots, a_r$ such that (i) $a_1 = v_p$, (ii) for each $a_t$ before the last, the element $a_{t+1}$ is the first member of $\bar{E}$ to the right of $a_t$ having a smaller level than $a_t$, and (iii) there is no element of $\bar{E}$ to the right of $a_r$ having a smaller level than $a_r$.

A similar definition describes the first backward subsequence $BS(v_p)$ of $\bar{E}$, starting with $v_p$ with monotone decreasing levels. The only difference is that the word “left” replaces the word “right” in all places in the definition. The corresponding subsequence has the form $b_s, b_{s-1}, \ldots, b_1$ and is such that (i) $b_1 = v_p$, (ii) $b_{t+1}$ is always the first member of $\bar{E}$ to the left of $b_t$ having smaller level than $b_t$, (iii) there is no element of $\bar{E}$ to the left of $b_s$ with smaller level than $b_s$.

Theorem 2: A path $v_0, v_1, \ldots, v_p$ on $T(E)$ from the root $v_0$ to a leaf $v_p$ corresponds to $FS(v_p)$ and to $BS(v_p)$.

Proof: Using the previous lemma, we see that the members of the path are precisely the elements satisfying the conditions for being in either the first forward or first backward subsequence starting with $v_p$. □
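Theorem 2 can be illustrated on the level sequence of a small expression. In this minimal sketch the function names, the index-based representation, and the sample expression (the seven-variable shape of the Section 2 example with concrete operators filled in) are assumptions of the illustration; the scan is sequential rather than a circuit.

```python
def monotone_subsequence(levels, start, step):
    """Indices of the first forward (step=+1) or backward (step=-1) subsequence
    with monotone decreasing levels, starting at index `start`."""
    chosen = [start]
    i = start + step
    while 0 <= i < len(levels):
        if levels[i] < levels[chosen[-1]]:
            chosen.append(i)
        i += step
    return chosen


def root_path(levels, leaf):
    """Indices of the root-to-leaf path, obtained as FS(leaf) union BS(leaf)."""
    fs = monotone_subsequence(levels, leaf, +1)
    bs = monotone_subsequence(levels, leaf, -1)
    return sorted(set(fs) | set(bs), key=lambda i: levels[i])


# Terms of ((((((a)+(b))*(c))-((d)/(e)))+(f))*(g)) with parentheses erased,
# and their levels in the computation tree.
terms = list("a+b*c-d/e+f*g")
levels = [5, 4, 5, 3, 4, 2, 4, 3, 4, 1, 2, 0, 1]
path = root_path(levels, terms.index("e"))
```

For the leaf `e`, the backward subsequence picks up `/` and `-` (its left ancestors) and the forward subsequence picks up the second `+` and the last `*` (its right ancestors), so the path has levels 0 through 4.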

If $v$ is a node of the tree $T(E)$ which is not a leaf, then it represents an operator. The subtree $T(E_v)$ of $T(E)$ associated with $v$ corresponds to an expression of the form $(E' v E'')$. The weights $L(v) = |E'|$ and $R(v) = |E''|$ are the weights of the left and right subtrees of $T(E_v)$ respectively. Our objective is to develop a procedure to determine $L(v)$ and $R(v)$ for each operator node $v$ of $T(E)$ which can be carried out in time $O(\log N)$.

Figure 6: Circuits to compute $(L(v), R(v))$.

The circuit to compute $L(v)$ and $R(v)$ contains a binary tree corresponding to each variable $w$ in $E$, a typical one of which we shall call $T_w$. The edges in the tree are serial transfer paths and the leaves receive the members $a$ of $\bar{E}$, with the exception of $w$, in their given order (see Figure 6). The label $\lambda(a)$ of each member $a$ of $\bar{E}$ is also applied to each corresponding leaf of $T_w$. These labels are also transmitted to the internal nodes of $T_w$ in such a way that each internal node $\mu$ receives the minimum label of all the leaves in its subtree. This is accomplished by sending messages from the leaves to the root of $T_w$ so as to give each internal node $\mu$ the minimum label received by its two children. This operation can clearly be carried out in time $O(\log N)$, giving each node $\mu$ of $T_w$ a label $\delta = \lambda(\mu)$, by endowing each node with a one-bit comparator and feeding the labels most-significant-bit first.

Two messages called tokens are sent from the root of $T_w$ along the left and right edges of $T_w$ connected to the root. These tokens have the form $(L, \delta)$ and $(R, \delta)$, where $L$ signifies “left” and $R$ “right” and $\delta$ is the label $\lambda(w)$. We now trace the behavior of the left token $(L, \delta)$ as it travels along the edges of $T_w$ towards the leaves. An analogous behavior will hold for $(R, \delta)$, but with “left” and “right” interchanged.

When starting out from the root, if the first node encountered by $(L, \delta)$ (i.e. the left child of the root) has label no less than $\delta$, then the token is erased; otherwise, it proceeds as follows.

After leaving the root of $T_w$, the token $(L, \delta)$ passes on a path along the edges of $T_w$ encountering various nodes until it finally comes to a leaf, where it stops. We shall see that this leaf of $T_w$ must correspond to an operator $v$ such that $w$ is in the right subtree of $T(E_v)$. It will therefore contribute 1 to the quantity $R(v)$. The way that the token $(L, \delta)$ chooses the correct path is now described.

Since $T_w$ is a binary tree, all its internal nodes are of degree 3 except the root which is of degree 2. When the token $(L, \delta)$ reaches an internal node $\mu$ of degree 3 it takes either the left or the right descending branch according to the following rule, where the labels on the left and right children of $\mu$ are $\delta_L$ and $\delta_R$, respectively.

Rule for left-token propagation

1. begin if $\delta_R < \delta$ then

2.    begin $(L, \delta)$ proceeds on the right edge;

3.       if $\delta_L < \delta_R$ then $(L, \delta_R)$ is created and sent on the left edge

4.    end

5. else $(L, \delta)$ proceeds on the left edge

   end

Note that token $(L, \delta_R)$ created at Line 3 follows the same rule as the former token except that, since it has a different label (namely $\delta_R$ instead of $\delta$), its interactions will be correspondingly different. An analogous Rule holds for $(R, \delta)$ with the following interchanges: $\delta_R \leftrightarrow \delta_L$, $R \leftrightarrow L$, left $\leftrightarrow$ right.

Naturally, all the generated tokens as well as the original two tokens are strings of $O(\log N)$ bits that propagate simultaneously in $T_w$ in bit-serial fashion. Since they all move on paths of the binary tree, they do not retrace their paths and hence the time required is determined by the sum of the length of their representation, which is $O(\log N)$, and of the maximum path length from the root to a leaf of the binary tree, which is also $O(\log N)$.
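The rule for left-token propagation can be simulated sequentially and compared with the backward subsequence. This sketch models only the left half of the tree (the terms to the left of $w$), assumes a balanced tree built by halving the leaf range, and recomputes minimum labels by slicing instead of the bottom-up message pass; these choices and the function names are assumptions of the illustration, not the paper's circuit.

```python
def left_token_leaves(levels, delta):
    """Leaves (indices into `levels`) reached by left tokens.

    `levels` holds the labels of the terms to the left of w, in order;
    `delta` is the label of w.
    """
    reached = []

    def send(lo, hi, d):
        # Token (L, d) enters the node spanning levels[lo:hi];
        # it is only ever sent to a node whose minimum label is below d.
        if hi - lo == 1:
            reached.append(lo)
            return
        mid = (lo + hi) // 2
        d_left, d_right = min(levels[lo:mid]), min(levels[mid:hi])
        if d_right < d:
            send(mid, hi, d)               # Line 2: proceed on the right edge
            if d_left < d_right:
                send(lo, mid, d_right)     # Line 3: new token (L, delta_R)
        else:
            send(lo, mid, d)               # Line 5: proceed on the left edge

    # The token is erased if the left child of the root has label >= delta.
    if levels and min(levels) < delta:
        send(0, len(levels), delta)
    return sorted(reached)


def backward_indices(levels, delta):
    """Indices of the backward subsequence, excluding w itself."""
    out, current = [], delta
    for i in range(len(levels) - 1, -1, -1):
        if levels[i] < current:
            out.append(i)
            current = levels[i]
    return sorted(out)


# Levels of the terms of ((((((a)+(b))*(c))-((d)/(e)))+(f))*(g)).
all_levels = [5, 4, 5, 3, 4, 2, 4, 3, 4, 1, 2, 0, 1]
w = 8  # the variable e
tokens_at = left_token_leaves(all_levels[:w], all_levels[w])
```

For $w = e$ the tokens arrive at indices 5 and 7, i.e. at the operators `-` and `/`, which are exactly the left ancestors of `e`.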

It  remains  to  prove  the  following. 

Theorem 3: The leaves of $T_w$ which receive tokens are exactly those which correspond to operators on the path of $T(E)$ from $w$ to the root. Furthermore, those receiving a left token have $w$ in their right subtrees (are left ancestors of $w$), and those receiving a right token have $w$ in their left subtrees (are right ancestors of $w$).

Proof:  We  shall  show  that  those  leaves  in  the  left  subtree  of  Tw  which  receive 
left  tokens  correspond  to  just  the  members  of  BS(w).  An  analogous  result 
applies  to  the  right  subtree.  Then,  using  Theorem  2  the  result  is  proved. 

There are three parts to the proof. First, we show that the original token
(L, δ) goes to the leaf of T_w corresponding to the first member of E to the
left of w having level less than δ. To show this, we imagine that each edge
of T_w has a label λ(a) which is the same as the label of the child node a to
which it connects. We now trace the path from the root of T_w in the left
subtree taken by the original token (L, δ). It can never follow an edge with
label as large as δ, but otherwise it will always go to the right if possible
until it reaches a leaf. All parts of the subtree which are to the right of
this path must have labels which are greater than or equal to δ, by the
mechanism that assigns labels to the internal nodes of T_w. On the other hand,
the label of the leaf reached by the token must be less than δ since it lies
on the path. This proves our first assertion.

Second, we show that any new token (L, δ') generated in the process must
go to one of the members of BS(w). At the point of its initiation, the token
(L, δ') cannot have passed to the right of any part of the left subtree with
label less than δ', because the tokens from which it was generated had labels
greater than δ'. Following its initiation, the token (L, δ') follows a path such
that all parts of the subtree which are to the right of this path must have
labels at least as large as δ'. Thus, the leaf it reaches is the first one with
label less than δ' and is thus on BS(w).

Third, we show that every member of BS(w) must receive a token. Let
the operator a with level λ be a member of this subsequence. From the
definition, we know that λ < δ and that all leaves of the left subtree of T_w
which occur to the right of a have level greater than λ. Tracing a path on
T_w from a to the root, we see that this entire path has labels at most λ, but
that tributaries to the right of this path have labels which are greater than
λ. Now, consider a token starting out on this path. In the beginning it has
label δ and as long as the path goes to the right it will retain this label and
follow the path. Now, if the path goes left, it either retains the label δ or else
obtains a new label δ' < δ. However, δ' > λ because the path is ultimately
connected to the leaf of a. Thus, a left token will ultimately reach a.

Now, by Theorem 2, we see that the leaves of T_w reached by tokens are
exactly those corresponding to the operators on the path of T(E) from its
root to the leaf corresponding to w. Also, those receiving left tokens have w
in their right subtrees and those receiving right tokens have w in their left
subtrees. □
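
The characterization in Theorem 3 can be checked on a small example. The
following Python sketch is an illustration only, not the network itself. It
assumes single-character variables, the operators + and *, a level equal to
the parenthesis nesting depth for operators and, as an assumption made here so
that a variable's parent has a strictly smaller level, depth plus one for
variables. The helper names term_levels, scan_ancestors, and tree_ancestors
are hypothetical. Scanning away from w and keeping each operator whose level is
below every level kept so far yields the left and right ancestors, which are
then compared with the ancestors found by parsing T(E) explicitly.

```python
def term_levels(expr):
    """List (symbol, level, is_operator) for each term, left to right."""
    terms, depth = [], 0
    for ch in expr:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch in "+*":
            terms.append((ch, depth, True))
        elif not ch.isspace():
            # Assumption: a variable sits one level below its enclosing operator.
            terms.append((ch, depth + 1, False))
    return terms


def scan_ancestors(terms, i):
    """Left and right ancestors of term i, using level comparisons only."""
    def scan(indices):
        found, bound = [], terms[i][1]
        for j in indices:
            _, level, is_op = terms[j]
            if is_op and level < bound:  # first operator below the current bound
                found.append(j)
                bound = level
        return found
    return scan(range(i - 1, -1, -1)), scan(range(i + 1, len(terms)))


def tree_ancestors(expr, i):
    """The same two lists, obtained by parsing the expression tree explicitly."""
    toks = [c for c in expr if not c.isspace()]
    parent = {}  # child term index -> (parent term index, side of the child)
    state = {"pos": 0, "idx": 0}

    def node():
        if toks[state["pos"]] == "(":
            state["pos"] += 1
            left_child = node()
            op = state["idx"]
            state["idx"] += 1
            state["pos"] += 1  # skip the operator symbol
            right_child = node()
            state["pos"] += 1  # skip ")"
            parent[left_child] = (op, "L")
            parent[right_child] = (op, "R")
            return op
        v = state["idx"]
        state["idx"] += 1
        state["pos"] += 1
        return v

    node()
    left, right = [], []
    while i in parent:
        i, side = parent[i]
        # w in the right subtree of an operator makes it a left ancestor of w.
        (left if side == "R" else right).append(i)
    return left, right


expr = "((a+b)*(c+(d*e)))"
terms = term_levels(expr)
for i in range(len(terms)):
    assert scan_ancestors(terms, i) == tree_ancestors(expr, i)
```

For the variable d (term index 6) both functions return the left ancestors
[5, 3] and the right ancestor [7], that is, the inner +, the root *, and the
inner * of the example expression.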

The final steps for computing L(v) and R(v) for each operator v of E
must be carried out by adding the numbers of left and right tokens received
by each v. This can be done by using a separate pair of adders for each
operator. Such an adder is in effect a parallel counter using N single-bit
inputs corresponding to the N variables, and can be constructed as a binary
tree of depth log_2 N. The computation time is O(log N). In Figure 6 we
have a global illustration of the machinery implementing the computation of
L(v) and R(v) for all internal nodes v of T(E).
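
The parallel counter can be pictured as a pairwise reduction. The sketch below
is a software analogy under simplifying assumptions (whole-integer additions
rather than bit-serial adders); the function name tree_count is hypothetical.
It returns the count together with the number of tree levels used, which is
the ceiling of log_2 N for N inputs.

```python
def tree_count(bits):
    """Add single-bit inputs with a balanced binary tree of adders.

    Returns (sum, depth), where depth is the number of adder levels used.
    """
    level = list(bits)
    depth = 0
    while len(level) > 1:
        # Each adder at this level combines two neighbouring partial sums.
        nxt = [level[k] + level[k + 1] for k in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])  # an unpaired value passes through unchanged
        level = nxt
        depth += 1
    return (level[0] if level else 0), depth


print(tree_count([1, 0, 1, 1, 0, 1, 1, 0]))  # (5, 3)
```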

Combining the foregoing discussion with the results of Section 4 on the
conversion of L(v) and R(v) to ρ(v | r(v)), we have the following theorem:

Theorem 4. The computation of the relative assignments ρ(v | r(v)) for each
term v of an N-term expression E can be done by a boolean network of size
O(N^2) in time O(log N).

6.  Distribution  of  the  Offsets 

The last step needed to calculate the second term in the expression for
ρ(v) (formula (4)), for each vertex v of T(E), requires forming the sum
Σ ρ(w | r(w)) over all the ancestors w of v in T(E) such that v is in the
lighter subtree of w.

To carry out this calculation we can use the same structure that was used
to calculate the weights L(v) and R(v) and illustrated in Figure 6. Again,
we let δ = λ(w). Tokens of the form (L, δ) and (R, δ) are sent from the root
of each tree T_w, but in this case the rules obeyed by the tokens are different
from those described in Section 4. (Notice that the token labels L and R
are used here to aid the explanation but need not be implemented.) We also
assume that the functions L(w) and R(w) have already been calculated.

Now, for each node v, we wish to add the offset of w provided that v
is in the lighter subtree of w. We begin by comparing L(w) with R(w) to
determine which of these two numbers is smaller and in this way decide into
which of the two subtrees of the root of T_w to send the token.

In order to send the token to only those leaves of T_w which correspond to
descendants of w in T(E), we must choose the rules suitably. For concreteness,
let us assume that R(w) < L(w), in which case the token (R, δ) enters
the right subtree of the root of T_w. Then all the leaves of T_w corresponding
to nodes v of T(E) that must receive the offset ρ(w | r(w)) occur in a
consecutive sequence at the left of this subtree. Specifically, this sequence
of leaves is bounded on the right by a leaf p corresponding to a (right)
ancestor of w, which is associated with a label λ' < δ. It follows that the
token must be distributed exactly to the left subtrees of the leaf-to-root
path in T_w originating in p. Therefore we have the following rules:

(i). If the descending token (R, δ) enters a vertex of T_w whose label is less
than δ, then the token follows the branch to the left child.

(ii). If the descending token enters a vertex whose label is not less than δ,
then the token is duplicated and both children receive (R, δ).

(Note: (1) It is not possible for the label of a vertex reached by this process
to be the same as that of w. (2) If the leaf p exists, there must be at least
one leaf to its left in the right subtree of T_w. Therefore, p can never receive
a token.)

An entirely analogous set of rules applies to the original token (L, δ),
which is created when L(w) < R(w). In this case (L, δ) is sent into the left
subtree and the above rules are used with “right” and “left” interchanged in
all places.

Finally, all the tokens will reach leaves of T_w which represent vertices
in the lighter subtree of T(E) whose root is w. To each of these leaves we
attach the offset ρ(w | r(w)) calculated by the method described in Section 4
and given by Formula (5).

The last step consists of adding, for each term v of E, the offsets obtained
from all the trees T_w. This may be done, again in time O(log N), by the
same adders which were used for computing the functions L(v) and R(v),
as described in Section 5. (Note that in this case each adder tree functions
as a full-fledged adder of O(N) integers of O(log N) bits.) The result is the
assignment value ρ(v) according to formula (4).
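
The distribution and summation of offsets can be imitated in software. The
Python sketch below computes only the accumulated offsets (the second term of
formula (4)), under several assumptions that are not taken from the text:
levels are parenthesis nesting depths (depth plus one for variables), subtree
weights are measured in terms rather than variables, ties between L(w) and
R(w) are sent to the left subtree, and the offset values are made-up inputs.
All function names are hypothetical.

```python
def term_levels(expr):
    """List (symbol, level, is_operator) for each term, left to right."""
    terms, depth = [], 0
    for ch in expr:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch in "+*":
            terms.append((ch, depth, True))
        elif not ch.isspace():
            terms.append((ch, depth + 1, False))  # assumed level of a variable
    return terms


def side_run(terms, i, step):
    """Terms next to operator i on one side whose level exceeds that of i.

    The run stops at the first term with a smaller level (an ancestor of i),
    so it is exactly the left (step = -1) or right (step = +1) subtree of i.
    """
    run, j = [], i + step
    while 0 <= j < len(terms) and terms[j][1] > terms[i][1]:
        run.append(j)
        j += step
    return run


def accumulate_offsets(expr, offset):
    """For every term, add the offsets of ancestors whose lighter subtree holds it."""
    terms = term_levels(expr)
    total = [0] * len(terms)
    for i, (_, _, is_op) in enumerate(terms):
        if not is_op:
            continue
        left, right = side_run(terms, i, -1), side_run(terms, i, +1)
        # Assumption: on a tie the left subtree is treated as the lighter one.
        lighter = right if len(right) < len(left) else left
        for j in lighter:
            total[j] += offset[i]
    return total


# Hypothetical offsets for the four operators of the example expression.
print(accumulate_offsets("((a+b)*(c+(d*e)))", {1: 10, 3: 100, 5: 1000, 7: 10000}))
# [110, 100, 100, 0, 1000, 0, 10000, 0, 0]
```

In the example the root operator (index 3) has a three-term left subtree and a
five-term right subtree, so its offset 100 is added to the terms a, +, and b
only.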

We  summarize  the  discussion  as  follows: 

Theorem 5. The computation of the (absolute) assignment ρ(v) for each
term v of an N-term expression E can be done by a boolean network of size
O(N^2) in time O(log N).

We next see how these assignments are used to accomplish the routing of
the terms of E to the universal evaluator.

7.  Routing  to  the  universal  evaluator 

Once the set of N integers {ρ(a) : a a term of E} is available, the pairs
(ρ(a), a) are formed and supplied to a routing network, where ρ(a) functions
as the address of record a. As usual, J = ⌈log N⌉.

Let B_{2^s} (for an integer s) denote the 2^s-input/2^s-output butterfly
network. The terminals of B_{2^s} are numbered from 0 to 2^s - 1 from left to
right and the stages of B_{2^s} are numbered from s - 1 to 0 from input to
output. Given an integer r ∈ [0, 2^s - 1], we let BIT_j(r) be the coefficient
of 2^j in the binary representation of r. Suppose that the (address, record)
pair (r, R) is applied at any input terminal of B_{2^s}; we say that R is
obliviously routed to output terminal r if at stage j record R is routed on
the right or on the left outgoing branch depending upon whether BIT_j(r) = 1
or 0, respectively. We have the following lemma:

Lemma 3. Let (r_0, r_1, ..., r_{p-1}), p ≤ 2^s, be a sequence of distinct
integers in the range [0, 2^s - 1], sorted in ascending order. Pair (r_i, R_i)
is applied to input terminal (c + i) mod 2^s of B_{2^s} (for some fixed
c ∈ [0, 2^s - 1]), and R_i is obliviously routed. Then the routing paths of the
p records are vertex disjoint.

Proof: Sequence (r_0, ..., r_{p-1}) applied to B_{2^s} as in the statement of
the Lemma is said to be well-positioned in B_{2^s}. To prove the lemma, it
suffices to show that the oblivious routing through Stage (s - 1) is free of
collisions and yields two sequences (r_0, ..., r_k) and (r_{k+1}, ..., r_{p-1}),
with r_k < 2^{s-1} ≤ r_{k+1}, which are respectively well-positioned in the
left subnetwork B_{2^{s-1}} and right subnetwork B_{2^{s-1}} that are obtained
by removing Stage (s - 1) from B_{2^s}.

A collision may occur only between two elements of the input sequence
applied to two terminals of B_{2^s} situated 2^{s-1} positions apart. It is
immediately realized that no collision occurs for p ≤ 2^{s-1}, so we consider
p > 2^{s-1}. If c < 2^s - p + 1, then BIT_{s-1}(r_i) ≠ BIT_{s-1}(r_{i+2^{s-1}})
(for any i = 0, ..., p - 1), for otherwise
BIT_{s-1}(r_i) = BIT_{s-1}(r_{i+1}) = ... = BIT_{s-1}(r_{i+2^{s-1}}) because
(r_0, ..., r_{p-1}) is sorted. But this implies that there are at least
2^{s-1} + 1 distinct integers in [0, 2^s - 1] with identical most-significant
bit, which is false. An analogous argument holds when c ≥ 2^s - p + 1, thus
establishing the first part of the lemma.

To prove the second part, we consider the case c + p - 1 < 2^s, the other
case being analogous. If neither interval [c, c + k] nor [c + k + 1, c + p - 1]
contains 2^{s-1} - 1, then (r_0, ..., r_k) and (r_{k+1}, ..., r_{p-1}) are each
applied as a single segment in the left and right half, respectively.
Otherwise, one of them, say (r_0, ..., r_k), is split into segments
(r_0, ..., r_{2^{s-1}-c-1}) and (r_{2^{s-1}-c}, ..., r_k), which are jointly
applied to form a well-positioned sequence in the left half; the other
sequence is applied as a single segment in the right half. In all cases
we obtain well-positioned sequences in the left and right B_{2^{s-1}}
subnetworks. □
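
Lemma 3 can be checked exhaustively for a small butterfly. The Python sketch
below is a simulation under one modelling assumption: a vertex is identified
by its column at a given stage boundary, and routing through stage j replaces
bit j of the current column by BIT_j(r). The function name disjoint_paths is
hypothetical. Two records collide when they occupy the same column after some
stage.

```python
from itertools import combinations


def disjoint_paths(addresses, c, s):
    """Route record i from input (c + i) mod 2^s to output addresses[i].

    Returns True when no two records ever share a column, i.e. the
    routing paths are vertex disjoint in this model.
    """
    n = 1 << s
    cols = [(c + i) % n for i in range(len(addresses))]
    for j in range(s - 1, -1, -1):  # stages s-1 down to 0
        bit = 1 << j
        # Stage j sets bit j of the column to BIT_j of the address.
        cols = [(x & ~bit) | (r & bit) for x, r in zip(cols, addresses)]
        if len(set(cols)) != len(cols):
            return False
    return cols == list(addresses)  # every record reached its address


s = 3
all_ok = all(
    disjoint_paths(seq, c, s)
    for p in range(1, (1 << s) + 1)
    for seq in combinations(range(1 << s), p)  # tuples come out sorted
    for c in range(1 << s)
)
print(all_ok)

# The sortedness hypothesis matters: this unsorted sequence collides.
print(disjoint_paths([0, 2, 1], 0, 2))  # False
```

The first check covers every sorted sequence of distinct addresses and every
starting terminal c for s = 3.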


To carry out the routing, we could sort the set {(ρ(a), a) : a a term of E}
in ascending order by ρ(a) and apply the sorted sequence to the leftmost
segment of inputs of the appropriate butterfly network for oblivious routing.
The latter is B_{2^{2J-1}}, since, for J > 1,
2^{2J-2} < (4^J - 1)/3 < 2^{2J-1} and therefore each ρ(·) is in the range
[0, 2^{2J-1} - 1]. However, some pruning of B_{2^{2J-1}} is possible, since at
most 2^{J+1} - 3 terminals are used at the input of Stage (2J - 2),
2(2^{J+1} - 3) at the input of Stage (2J - 3), and so on, until we reach
Stage J, where more than 2^{2J-2} inputs are used. In Stages (2J - 2),
(2J - 3), ..., J we will remove from B_{2^{2J-1}} all nodes that are not
reachable by any of the input terminals with index ≤ 2^{J+1} - 4. In the
subsequent Stages J - 1, J - 2, ..., 0 we will remove all nodes that are not
reachable by any of the output terminals with index ≤ e_J = (4^J - 1)/3. We
leave it as an exercise to show that the number of branching nodes of the
pruned network is
((3J - 1)2^{2J} + 1)/9 + 2^{2J} - 3·2^{J-1} - 6 = O(N^2 log N). The routing is
obviously accomplished in time O(log N). Since the preliminary sorting can
be done in time O(log N) by a mesh-of-trees [MP75,L84] with O(N^2) leaves,
we conclude:


Theorem 6. Routing of the expression terms to the terminals of the universal
evaluator can be done in time O(log N) with equipment of size O(N^2 log N)